An improved pooling method for convolutional neural networks

The pooling layer in convolutional neural networks (CNNs) plays a crucial role in reducing spatial dimensions and improving computational efficiency. However, standard pooling operations such as max pooling and average pooling are not suitable for all applications and data types. Developing custom pooling layers that can adaptively learn and extract relevant features from specific datasets is therefore of great significance. In this paper, we propose a novel approach to designing and implementing customizable pooling layers that enhance feature extraction in CNNs. The proposed T-Max-Avg pooling layer selects the K pixels with the highest interaction in each pooling region and incorporates a threshold parameter T that controls whether the output is based on the maximum value or on the weighted average of those pixels. By learning the optimal pooling strategy during training, our custom pooling layer can effectively capture and represent discriminative information in the input data, thereby improving classification performance. Experimental results show that the proposed T-Max-Avg pooling layer performs well on three different datasets. Compared with the LeNet-5 model using average pooling, max pooling, and Avg-TopK pooling, the T-Max-Avg pooling method achieves the highest accuracy on the CIFAR-10, CIFAR-100, and MNIST datasets.

The pooling layer operates on regions of the feature map, performing operations such as selecting the maximum value (max pooling) or calculating the average value (average pooling) within each region.
There are two main types of pooling operations: max pooling and average pooling. Max pooling selects the maximum value within each region as the output, while average pooling calculates the average value. These operations help preserve key features and exhibit a certain degree of translation invariance, making the network more robust to small variations in the input. The pooling layer serves two primary purposes in CNNs. Firstly, it reduces the spatial dimensions of feature maps, reducing computational complexity and shortening training time. Secondly, it extracts essential features and reduces redundant information, thereby improving the model's generalization ability and interpretability.
The max pooling operation selects the maximum value within each region as the output, thereby ignoring the information of other values. This may lead to the neglect or loss of some critical features, especially in cases involving small-sized features. This information loss can affect the model's perception of details. Max pooling preserves the maximum value of the input, but it loses the positional information of that maximum value in the feature map. This means that max pooling has certain limitations in preserving positional invariance. In certain applications, positional information may be crucial for accurate classification or segmentation tasks. Max pooling requires selecting an appropriate region size for the operation. Choosing a larger pooling region can reduce the dimensionality of the feature map and improve computational efficiency but may result in more information loss. Choosing a smaller pooling region can better preserve detailed information but increases computational complexity.
Average pooling takes the average value of pixels within each pooling region, resulting in the loss of fine details in the image. Since multiple pixels are combined into one, the pooled feature map cannot accurately preserve subtle variations and edge information present in the original image. Moreover, average pooling is a spatially invariant operation, meaning that regardless of the position of the feature within the image, the average pooling result will be the same. This may not be desirable in certain cases where positional information is important, such as object detection tasks. Additionally, pooling operations reduce the resolution of the feature map, leading to image blurring. This blurring effect can become more pronounced when using larger pooling regions or stacking multiple pooling layers, ultimately impacting the performance of the model.
Choosing the appropriate pooling type for a CNN model is a challenging decision that requires a comprehensive understanding of the dataset. Another issue related to pooling layers is how to appropriately account for the characteristics of grouped images during the pooling process. Keeping the loss at a minimum is a key factor in ensuring the success of the model. Therefore, when designing CNN models, it is important to carefully evaluate the characteristics of different pooling methods and to select among them based on task requirements and dataset features in order to enhance model performance. Several studies in the literature have sought to overcome the limitations of the pooling functions used in CNNs, including detailed explorations of those limitations. These studies are explained in the literature review section.
Hyun et al. 16 introduced a novel pooling method called Universal Pooling (UP). UP performs different pooling functions based on the training samples and is inspired by attention mechanisms. This method can be regarded as a channel-wise local spatial attention module and is distinct from attention-based feature reduction methods. UP can train complex pooling functions and outperforms previous pooling methods. However, UP entails a significant computational cost. Williams et al. 17 proposed Wavelet Pooling as an alternative to traditional neighborhood pooling in convolutional neural networks. This method decomposes the input features into a second-level decomposition and discards the first-level subbands to reduce the feature dimensionality. By doing so, Wavelet Pooling tackles the overfitting often encountered with max pooling while compacting the features more efficiently than neighborhood pooling. Experimental results on standard classification datasets demonstrate strong performance. However, Wavelet Pooling involves substantial computational complexity, which may make it difficult to apply in practice.
In general, pooling methods proposed in similar studies often suffer from issues such as complexity, lack of flexibility, slow speed, and difficulty of use. Moreover, they may slow down the training and prediction processes of the model instead of accelerating them. This research aims to provide a new and concise approach to minimize the information loss caused by pooling layers.
Inspired by the Avg-TopK 18 method proposed by Cüneyt et al., we incorporate a parameter T to capture the significant features within the pixels with the highest representation and further enhance accuracy. This approach addresses the limitations of both max pooling and average pooling.
LeNet-5 19 is a classic convolutional neural network model proposed by Yann LeCun 19. As shown in Fig. 8, it consists of five different layers, including two convolutional layers, two pooling layers, and one fully connected layer. It was designed for handwritten digit recognition tasks and has been widely used in image classification, edge detection, and object recognition, among other fields. The advantages of LeNet-5 lie in its simple and effective architectural design and its parameter sharing mechanism. Through multiple layers of convolution and pooling operations, LeNet-5 can extract local features from images and reduce the amount of data, thereby speeding up training and improving classification performance. This laid the foundation for the later development of more complex convolutional neural network models.
We propose a new pooling method aimed at overcoming the limitations of traditional pooling functions. This new method addresses the problems commonly seen in CNNs with max pooling and average pooling layers. Our primary objective is to prevent the loss of highly representative values that may be disregarded by traditional pooling methods and to ensure these values are appropriately represented. To achieve this, we select the K pixels with the highest representational capacity and incorporate a learning parameter T to compute the maximum and average values of these pixels based on crucial feature information. Our strategy aims to alleviate the drawbacks of max pooling and average pooling while eliminating noise. We conducted experiments on three different benchmark datasets (CIFAR-10, CIFAR-100, and MNIST) to compare this new pooling method with traditional pooling methods.
The proposed pooling method is expected to have a positive impact on the expansion and development of the convolutional neural network field, especially in tasks that require accurate preservation of features. Additionally, by offering an alternative to the drawbacks of traditional pooling methods, the proposed pooling approach aims to contribute to the improvement and advancement of existing techniques in the field. Contributions of the study:
• We conducted a study on the effects of the T-Max-Avg, Avg-TopK, and Max-Avg pooling models on grayscale and color images. We observed that accurately summarizing these features plays a significant role in improving the performance of the model, especially in tasks such as classification.
• Like traditional pooling methods, the T-Max-Avg method is simple, user-friendly, and exhibits good robustness. It offers advantages in terms of cost and speed, making it a favorable choice.
• The T-Max-Avg method does not impose additional overhead on CNN models. It does not increase computational load while enhancing the robustness of the model.
• According to extensive experimental research using different datasets, the T-Max-Avg method has shown higher accuracy than the Avg-TopK, maximum, and average pooling methods. This indicates that the T-Max-Avg method can more accurately capture feature information and provide better results during the model training process.
• The proposed pooling method aims to address the shortcomings of traditional pooling methods and provide an alternative choice for the development and expansion of existing methods in the field.
The remainder of this paper is structured as follows: Section 2 provides a brief review of previous work on pooling layers. Section 3 discusses the proposed method in detail. Section 4 introduces the experimental study and results. Section 5 presents subsequent expansion experiments. Finally, the paper closes with a discussion and conclusion.

Literature review
The following are the pooling methods proposed by researchers as alternatives to traditional algorithms in the field of pooling layers. Bilinear pooling, proposed by Lin et al. 20, is a commonly used pooling method in deep learning. It extracts the second-order relationships between features by computing the outer product of two input feature maps. This method is able to better capture the spatial relative positions and interactions between features. Lee et al. 21 proposed three new pooling methods: hybrid pooling, gate pooling, and tree pooling. Hybrid pooling combines different pooling functions and dynamically selects which pooling operation to apply to each pixel based on the specific requirements. Gate pooling uses gating mechanisms to dynamically determine which pooling function to apply to each pixel. Tree pooling uses a tree structure to partition feature maps and capture hierarchical feature information. These new pooling methods offer more flexible and diverse ways of pooling, enabling more effective extraction of important information from images or features.
Detail-preserving pooling (DPP) 22 is an adaptive pooling method that preserves important 23 structural details. This method utilizes the concept of inverse bilateral filters. DPP enables downsizing to focus on critical structural details; learnable parameters control the amount of detail preservation.
Stergiou et al. 24 proposed an adaptive exponential weighted pooling method called adaPool. This method learns a region-specific fusion of two sets of pooling kernels based on the Dice-Sørensen coefficient and the exponential maximum. adaPool improves the preservation of details in various tasks such as image and video classification, as well as object detection. One key feature of adaPool is its bidirectional nature, where the learned weights can also be utilized for upsampling activation maps. adaPool consistently achieves good experimental results across tasks and backbone structures. Zhang et al. 25 proposed a novel end-to-end trainable global pooling operator called AlphaMEX Global Pool, which utilizes a nonlinear smooth logarithmic mean exponential function, AlphaMEX, to effectively extract features and enhance network intelligence. Compared to the original global pooling layer, their method improves classification accuracy without the need for additional layers or excessive redundant parameters. Experimental results on CIFAR-10/CIFAR-100, SVHN, and ImageNet demonstrate the effectiveness of this approach.
Lee et al. 26 proposed a graph pooling method based on self-attention. By combining graph convolution with self-attention, their method can simultaneously consider node features and graph topology. To ensure a fair comparison, they conducted experiments using the same training procedure and model architecture for existing pooling methods and their proposed method. The experimental results demonstrate that their method achieves superior graph classification performance on benchmark datasets with a reasonable number of parameters.
One of the latest pooling methods is the vector pooling block (VPB) 27, proposed by Mohamed et al. The VPB consists of two data paths, primarily extracting features in the horizontal and vertical directions. Unlike traditional pooling layers that use fixed square kernels, the VPB employs longer and narrower pooling kernels, enabling the convolutional neural network (CNN) to gather both global and local features simultaneously. The new Avg-TopK pooling model proposed by Cüneyt et al. 18 selects the top K pixels with the highest interactions and averages them. This model is designed to address the limitations of max pooling and average pooling.

Method
CNNs are widely used in computer vision applications. Their main function in these applications is to automatically extract image features. In the convolutional layers of a CNN, the input values undergo filtering operations to extract the features of the image. Through these filtering operations, the CNN can identify unique features within the image. The output of the convolutional operation is referred to as feature maps. Fig. 1 illustrates the process of applying the convolutional operation to 6 × 6 image data. After the convolutional operation, the output is passed to the pooling layer for further processing.
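As a rough illustration of the filtering step described above, the following sketch applies a single filter to a 6 × 6 input. The 3 × 3 averaging kernel, the function name `conv2d_valid`, and the absence of padding are assumptions made only for this example; they are not taken from Fig. 1.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide `kernel` over `image` without padding (stride 1) and return the feature map.

    Like most CNN libraries, this computes a cross-correlation (the kernel is not flipped).
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the current window with the kernel, then sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)   # illustrative 6 x 6 input
kernel = np.ones((3, 3)) / 9.0                     # hypothetical averaging filter
feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (4, 4)
```

With a 3 × 3 kernel and no padding, the 6 × 6 input yields a 4 × 4 feature map, which is then handed to the pooling layer.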

Pooling layer
In the CNN architecture, there is no direct connection allowing information to flow from the next layer back to the previous layer. This implies that there is no direct communication between the layers. However, for the success of the model, it is crucial to transmit valuable information to the next layer. Our experimental results will compare the average, maximum, and Avg-TopK pooling methods.

Average pooling
Average pooling 28 is a commonly used feature extraction operation that is widely applied in convolutional neural networks. It reduces the spatial dimensions of features by calculating the average value of each small window or region. Specifically, for each small window or region, average pooling computes the average of all the values within it and replaces the original data with this average. However, because it averages, it may not effectively handle subtle feature variations or significant features present in certain image regions. The formula for this method is shown in Eq. (1).
$$F(\mathrm{Avg}(X)) = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad (1)$$
where $x_i$ are the values of the input image in the pooling region $X$ and $n$ is the number of values in the region.
Figure 2 shows the 2 × 2 matrix produced by applying average pooling with a 3 × 3 filter size to a 6 × 6 pixel input.
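The operation described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a single-channel 2-D input, non-overlapping windows (stride equal to the pool size), and an input whose sides are multiples of the pool size; the function name `average_pool` and the sample values are hypothetical and are not the values shown in Figure 2.

```python
import numpy as np

def average_pool(x: np.ndarray, pool: int) -> np.ndarray:
    """Non-overlapping average pooling over a 2-D array (stride == pool)."""
    h, w = x.shape
    assert h % pool == 0 and w % pool == 0, "input must tile evenly"
    # Reshape so that axes 1 and 3 index the rows and columns inside each window.
    windows = x.reshape(h // pool, pool, w // pool, pool)
    return windows.mean(axis=(1, 3))

x = np.arange(36, dtype=float).reshape(6, 6)  # illustrative 6 x 6 input
avg_out = average_pool(x, pool=3)
print(avg_out)
# [[ 7. 10.]
#  [25. 28.]]
```

Each output entry is the mean of the nine values in its 3 × 3 window, so the 6 × 6 input collapses to 2 × 2.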

Max pooling
Max pooling is a commonly used feature extraction operation, typically applied in convolutional neural networks. It reduces the spatial dimensions of features by selecting the maximum value within each small window or region. Specifically, for each small window or region, max pooling selects the maximum value and replaces the original data with it. This operation helps to preserve the most prominent features in an image, such as edges and textures. Compared to average pooling, max pooling is better at capturing local feature information. As a result, max pooling 29 is widely used in image processing and computer vision tasks.
In the process of max pooling, only the maximum value within a small window or region is selected as a representative, while other detailed information is discarded. This operation leads to partial information loss in the original data and reduces the accuracy of features. The mathematical expression of this method is shown in Eq. (2).
$$F(\mathrm{Max}(X)) = \max_{1 \le i \le n} x_i \qquad (2)$$
where $x_i$ are the values of the input image in the pooling region $X$ and $n$ is the number of values in the region.
The 2 × 2 matrix obtained when max pooling with a 3 × 3 filter size is applied to a 6 × 6 pixel input is shown in Figure 3.
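A minimal NumPy sketch of max pooling follows, under the same assumptions as any simple non-overlapping pooling: a single-channel 2-D input and a stride equal to the pool size. The function name `max_pool` and the sample input are illustrative and are not the values from Figure 3.

```python
import numpy as np

def max_pool(x: np.ndarray, pool: int) -> np.ndarray:
    """Non-overlapping max pooling over a 2-D array (stride == pool)."""
    h, w = x.shape
    assert h % pool == 0 and w % pool == 0, "input must tile evenly"
    # Axes 1 and 3 index positions inside each pool x pool window.
    windows = x.reshape(h // pool, pool, w // pool, pool)
    return windows.max(axis=(1, 3))

x = np.arange(36, dtype=float).reshape(6, 6)  # illustrative 6 x 6 input
max_out = max_pool(x, pool=3)
print(max_out)
# [[14. 17.]
#  [32. 35.]]
```

Only one value per window survives; the other eight values in each window have no influence on the output, which is the information loss discussed above.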

Avg-TopK pooling
The Avg-TopK method aims to eliminate the drawbacks of both the max pooling and average pooling methods. It computes an approximate value that represents all pixels by selecting the K highest entries from the input data and taking the average of these K pixels. Equation (3) represents the mathematical expression of the Avg-TopK pooling method.
$$F(\text{Avg-TopK}) = \frac{1}{K}\sum_{i=1}^{K} Y_i \qquad (3)$$
X represents the set of pixels in the pooling region, selected according to the pool size from the data coming from the convolution layer. $Y_i$ represents the i-th highest pixel value. F(Avg-TopK) represents the average of the K highest items.
Figure 4 shows the operations for K = 3 in the Avg-TopK pooling operation with a filter size of 3 × 3 on a 6 × 6 pixel input.
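The Avg-TopK rule (average of the K largest values in each window) can be sketched as follows. This is an illustrative NumPy version, not the reference implementation of Avg-TopK; it assumes a single-channel 2-D input and non-overlapping windows, and the name `avg_topk_pool` and the sample input are hypothetical.

```python
import numpy as np

def avg_topk_pool(x: np.ndarray, pool: int, k: int) -> np.ndarray:
    """Average of the k largest values in each non-overlapping pool x pool window."""
    h, w = x.shape
    assert h % pool == 0 and w % pool == 0, "input must tile evenly"
    assert 1 <= k <= pool * pool, "k must not exceed the window size"
    windows = x.reshape(h // pool, pool, w // pool, pool)
    # Flatten every window into a vector of pool*pool values.
    flat = windows.transpose(0, 2, 1, 3).reshape(h // pool, w // pool, pool * pool)
    # Sort ascending and keep the last k entries, i.e. the k highest values.
    topk = np.sort(flat, axis=-1)[..., -k:]
    return topk.mean(axis=-1)

x = np.arange(36, dtype=float).reshape(6, 6)  # illustrative 6 x 6 input
topk_out = avg_topk_pool(x, pool=3, k=3)
print(topk_out)
# [[13. 16.]
#  [31. 34.]]
```

Setting k = 1 reduces this to max pooling, and setting k to the full window size (9 here) reduces it to average pooling, which is why the method sits between the two.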

Methodology suggested for pooling
Both the Avg-TopK method and the proposed method aim to address the drawbacks of max pooling and average pooling. In average pooling, highly representative values and minimally representative values are treated equally. As a result, when many values in the region are close to zero, the output is also close to zero and the dominant values do not receive the weight they deserve. In max pooling, the highest representative value is selected while all other values are ignored, which can significantly impact classification performance when the highest value is a noisy pixel. To address these two issues, our proposed pooling model takes multiple highly representative values into account and incorporates a parameter T whose value lies in the range [0, 1]. This parameter controls whether the maximum value or the average value of these highly representative values is taken. It thus resolves both the problem of near-zero inputs in average pooling and the problem of noisy pixels in max pooling. Moreover, this method can adapt to various image datasets through the parameter T, and it can represent an approximate value for all pixels.
In our proposed method, we first select the K highest pixel values from the input data. We then use the parameter T to control whether the output is the average or the maximum of these K highest pixel values. The proposed method is called T-Max-Avg. Eq. (4) represents the mathematical expression of the T-Max-Avg pooling method.
X represents the set of pixels selected according to the pool size from the data coming from the convolutional layer. Yi represents the i-th largest item, T represents the parameter ranging from 0 to 1, K represents the number of high-value items, and F(T-Max-Avg(X)) represents the final result.
Figure 5 shows the operations for K = 3 in the T-Max-Avg pooling operation with a filter size of 3 × 3 on a 6 × 6 pixel input.
In Figure 6, the results of the average, max, Avg-TopK (K = 3) and T-Max-Avg (K = 3) pooling methods applied to 3 × 3 data are shown visually.
These examples demonstrate that the recommended T-Max-Avg method provides values based on both maximum pooling and Avg-TopK.The values obtained from this method can extract the most important information from the image to a great extent.
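The sketch below illustrates one possible reading of the T-Max-Avg description above. It assumes that T acts as a threshold on the largest value in the window: if the maximum of the K highest values is at least T, the maximum is returned, and otherwise the average of the K highest values is returned. This rule, the function name `t_max_avg_pool`, and the sample values are assumptions made for illustration; the exact form of Eq. (4) may differ.

```python
import numpy as np

def t_max_avg_pool(x: np.ndarray, pool: int, k: int, t: float) -> np.ndarray:
    """Hypothetical T-Max-Avg pooling over non-overlapping pool x pool windows.

    ASSUMED rule (not confirmed by the text): output the window maximum when it
    is >= t, otherwise output the mean of the k largest values in the window.
    """
    h, w = x.shape
    assert h % pool == 0 and w % pool == 0, "input must tile evenly"
    assert 1 <= k <= pool * pool, "k must not exceed the window size"
    windows = x.reshape(h // pool, pool, w // pool, pool)
    flat = windows.transpose(0, 2, 1, 3).reshape(h // pool, w // pool, pool * pool)
    topk = np.sort(flat, axis=-1)[..., -k:]   # k highest values, ascending
    top_max = topk[..., -1]                   # largest value in each window
    top_avg = topk.mean(axis=-1)              # Avg-TopK value of each window
    return np.where(top_max >= t, top_max, top_avg)

# One 3 x 3 window with values already scaled to [0, 1].
window = np.array([[0.1, 0.2, 0.9],
                   [0.3, 0.8, 0.7],
                   [0.0, 0.4, 0.5]])
low_t = t_max_avg_pool(window, pool=3, k=3, t=0.7)    # max 0.9 >= 0.7, so max is used
high_t = t_max_avg_pool(window, pool=3, k=3, t=0.95)  # max 0.9 < 0.95, so top-3 mean is used
print(low_t, high_t)
```

Under this assumed rule, a small T makes the layer behave like max pooling and a large T makes it behave like Avg-TopK, which matches the statement that the method yields values based on both.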
Figure 7 shows the image transformation based on the selected pooling layer when applying 3 convolutions (1 filter) and 2 pooling layers (LeNet-5) to an example image. The pool size is 3 × 3. From the figure, it can be observed that the shape of the image changes with the pixel values produced by the different pooling methods.

Created CNN network
To evaluate the effectiveness of our proposed pooling layer, we conducted experiments using the same model, dataset, and parameters as the Avg-TopK method. Therefore, we chose the LeNet-5 19 convolutional neural network and public datasets. LeNet-5 was selected as the preferred option due to its simple structure and powerful classification capabilities. It is a traditional CNN architecture that has significantly contributed to the development of CNNs. The network model was initially proposed by LeCun et al. 19. The LeNet-5 network structure consists of a total of 7 layers, including 3 convolutional layers, 2 pooling layers, and 2 fully connected layers. Table 1 shows the architecture of the LeNet-5 convolutional neural network. Figure 8 illustrates the CNN architecture used in the experimental study.

Dataset
In this section, we will explain the datasets used to compare the performance of the proposed model with traditional methods.These datasets include publicly available MNIST, CIFAR-10, and CIFAR-100 datasets.

CIFAR-10 dataset
The CIFAR-10 dataset is a classic dataset widely used for computer vision tasks. It contains 60,000 color images divided into 10 different classes, with each class having 6000 images. The classes include airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. Each image has a size of 32 × 32 pixels and consists of RGB channels. This dataset is commonly used to evaluate the performance of models in tasks such as image classification, object detection, and image generation in the field of computer vision. Due to its relatively small scale and diverse categories, CIFAR-10 has become one of the benchmark datasets extensively used in both academia and industry.

MNIST dataset
The MNIST dataset 30 is a classic handwritten digit recognition dataset widely used in machine learning and computer vision research and education. It consists of a collection of samples representing digits from 0 to 9, with each sample being a grayscale image of size 28 × 28 pixels.
The MNIST dataset contains 60,000 training samples and 10,000 testing samples. The training samples are used to train models, while the testing samples are used to evaluate model performance. These samples are handwritten by different individuals, covering various styles and variations. Therefore, the MNIST dataset is valuable for assessing the robustness and generalization ability of models in digit recognition tasks.

Network parameters
In this section of the experimental analysis, all experiments were conducted with the same settings as Avg-TopK. The Adam algorithm with its default parameters was used as the gradient-based optimization algorithm. The pixel values of the images were scaled from 0-255 to 0-1 to normalize the dataset. The model was trained for a total of 50 epochs. Additionally, the stride value and parameters of all other network layers were set to their default values.
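The pixel scaling step mentioned above amounts to a single division. The following is a small sketch, assuming the images are stored as 8-bit unsigned integers; the function name `scale_pixels` and the random sample batch are hypothetical.

```python
import numpy as np

def scale_pixels(images: np.ndarray) -> np.ndarray:
    """Map 8-bit pixel values in [0, 255] to floats in [0, 1]."""
    return images.astype(np.float32) / 255.0

# Hypothetical batch of four 32 x 32 RGB images with random content.
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)
scaled = scale_pixels(batch)
print(scaled.dtype, scaled.min() >= 0.0, scaled.max() <= 1.0)
```

Converting to a floating-point type before dividing avoids integer truncation, which would otherwise map almost every pixel to 0.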

Experimental results
Three different datasets and the LeNet-5 CNN model were used for experimental research. The model settings are as described above. Different datasets were used to evaluate the performance of the model on both grayscale and color images. In CNN architectures, pooling layers are commonly used with a scale of 2 or 3. In this study, additional experiments were conducted using a pooling scale of 4, but scales beyond that were found to have little significance.
During the experiments with the LeNet-5 model, we used Google Colab, and for the extended experiments, we used a device with an NVIDIA GeForce GTX 1050. Since the method in this paper is trained with the same model and parameters as the Avg-TopK pooling method, the experimental results for the Avg-TopK, max pooling, and average pooling methods are consistent with that work, and those experiments did not need to be repeated. The experimental results were averaged over 6 runs. Following the experience of Cüneyt et al. 18, this paper adopts a pool size of 4 and K equal to 6 on all three datasets to fine-tune the learning parameter T. The fine-tuning results are shown in Figure 12.
In the studies conducted with pooling size 2, the accuracy values of the traditional and proposed pooling methods on the datasets are shown in Tables 2, 3 and 4.
The results in Tables 2, 3 and 4 are visualized in Figure 9.
• According to our observations, the max pooling method is significantly more successful than the average pooling method on all three datasets.
• The overall average accuracy of the T-Max-Avg method (K = 2), as described in the experiments, is higher than that of Avg-TopK and the traditional pooling methods.
Considering the highest results obtained with pool size 2 (Avg-Top2):
• The method in this paper improves the average accuracy on CIFAR-10 by 2.45% compared to the Avg-TopK pooling method, by 3.44% compared to the max pooling method, and by 11.45% compared to the average pooling method.
• The method in this paper improves the average accuracy on CIFAR-100 by 1.73% compared to the Avg-TopK pooling method, by 1.93% compared to the max pooling method, and by 6.93% compared to the average pooling method.
• The T-Max-Avg method improves the overall average accuracy on the MNIST dataset by 0.2% compared to the Avg-TopK pooling method, by 0.12% compared to the max pooling method, and by 0.84% compared to the average pooling method.
In the experimental study, when the pooling size is 3, we obtained the experimental results of the Avg-TopK, Max, Average, and the proposed pooling models.
Based on experience, we set the pool size to 4 and the K value to 6 for the CIFAR-10 and CIFAR-100 datasets, and set the pool size to 2 and the K value to 2 for the MNIST dataset, to seek the optimal learning parameter T for the T-Max-Avg pooling method. The accuracy results are shown in Tables 11, 12 and 13, and Figure 12 provides a visual representation of the optimization process.
• On the CIFAR-10 dataset, the T-Max-Avg method improved the average optimal accuracy by 0.28% compared to the Avg-TopK pooling method, by 4.32% compared to the max pooling method, and by 10.42% compared to the average pooling method.
• On the CIFAR-100 dataset, the T-Max-Avg method improved the average optimal accuracy by 0.53% compared to the Avg-TopK pooling method, by 4.11% compared to the max pooling method, and by 6.96% compared to the average pooling method.
• On the MNIST dataset, the T-Max-Avg method improved the average optimal accuracy by 0.01% compared to the Avg-TopK pooling method, by 0.43% compared to the max pooling method, and by 0.44% compared to the average pooling method.
Table 14 displays the top accuracy scores achieved in the experimental studies conducted on the datasets. Figure 13 illustrates the highest-scoring outcomes obtained with the LeNet-5 architecture on the three datasets using the various pooling methods. From Figure 12, it can be seen that the T-Max-Avg method achieves comparable performance on the MNIST dataset with a pool size of 2 and a K value of 2 at both T = 0.5 and T = 0.8. Further experimental comparisons show that T = 0.8 is the better learning parameter, as shown in Tables 4, 7, and 10.
According to Table 14, our proposed T-Max-Avg method achieves the highest scores on all datasets.
• For color images, the traditional pooling methods achieve their best score with a pool size of 3.
• For color images, the T-Max-Avg method achieves its highest score with a pooling size of 4 and T equal to 0.7.
• The T-Max-Avg method demonstrates a more significant improvement on the CIFAR-10 dataset. With a pooling size of 3, a K value of 3, and T equal to 0.7, it achieves a performance boost of 1.23% compared to the Avg-TopK method and 3.43% compared to the max pooling method. It exhibits an improvement of 8.83% compared to the average optimal accuracy of average pooling.
• The T-Max-Avg method can further enhance the accuracy of the Avg-TopK approach on the CIFAR-10 dataset. With a pooling size of 2, a K value of 1, and T equal to 0.7, it achieves a performance improvement of 1.4% compared to the Avg-TopK method and 1.93% compared to the max pooling method. It shows an improvement of 7.03% compared to the average optimal accuracy of average pooling.
• The T-Max-Avg method outperforms the Avg-TopK method on the MNIST dataset. With a pool size of 2, a K value of 2, and a T value of 0.8, it achieves a 0.1% improvement in average performance compared to Avg-TopK, a 0.11% improvement compared to max pooling, and a 0.35% improvement compared to average pooling.

Experiment description
To investigate the effectiveness of the T-Max-Avg pooling technique in transfer learning models, we conducted a series of experiments. These experiments used the CIFAR-10 dataset and applied the proposed pooling method to the pooling layers of traditional transfer learning models. We did not fine-tune the models but kept the parameters of the pooling layers unchanged to ensure that the transfer learning models had the same number of parameters. As there are no pooling layers in models such as MobileNet 31, MobileNetV2, and MobileNetV3, we did not study these models. Instead, we selected the VGG19 32, ResNet50 33, and ChestX 34 models for experimentation.
In the experiments on the VGG19 and ResNet50 transfer models, we set the K value of the T-Max-Avg pooling layer to 3. We found that the VGG19 model was unable to learn from the CIFAR-10 dataset. The ChestX model is a high-performance, low-latency, and high-accuracy convolutional neural network model proposed by Md. Nahiduzzaman et al. To investigate whether the T-Max-Avg pooling method improves the accuracy of the ChestX model, we replaced the last max pooling layer with a T-Max-Avg pooling layer, setting the pool size to 2 and the K value to 2, while keeping the parameters of the ChestX model unchanged. The model structure is shown in Figure 14 and Table 15. To further improve the classification accuracy on the CIFAR-10 dataset, we employed a cross-validation ensemble, based on the experimental results and experience of J. D. Domingo et al. 35. We adopted a five-fold cross-validation ensemble, and the model structure is shown in Figure 15. For detailed experimental results, please refer to Table 16.
The five-fold cross-validation ensemble is defined in Eq. (5), in which K represents the number of classifiers, each associated with its respective validation slot. x is an input sample, and pi(x) is a vector of output probabilities given by classifier i. This vector consists of probabilities pij(x), the probability assigned by classifier i to the sample x belonging to class j. c represents the vector of labels. In Eq. (5), soft voting 36 is obtained by accumulating the output probabilities for each class j. wi represents the weight associated with each classifier i, which in our example is 1/K. The argmax function returns the position of the class with the highest cumulative probability.
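The voting rules described in this section can be sketched with NumPy as follows. This is an illustrative version under stated assumptions: each row of `probs` is the probability vector pi(x) from one classifier, the weights default to 1/K, and the functions return the index of the winning class (which would then be looked up in the label vector c). The function names and the sample probabilities are hypothetical.

```python
import numpy as np

def soft_vote(probs, weights=None) -> int:
    """Weighted sum of class probabilities over classifiers, then argmax over classes."""
    probs = np.asarray(probs, dtype=float)            # shape (K, n_classes)
    k = probs.shape[0]
    w = np.full(k, 1.0 / k) if weights is None else np.asarray(weights, dtype=float)
    scores = w @ probs                                # sum_i w_i * p_ij(x) for each class j
    return int(np.argmax(scores))

def hard_vote(probs, weights=None) -> int:
    """Binarize each classifier's output to a one-hot vote, then take the weighted sum."""
    probs = np.asarray(probs, dtype=float)
    k = probs.shape[0]
    w = np.full(k, 1.0 / k) if weights is None else np.asarray(weights, dtype=float)
    votes = np.zeros_like(probs)
    votes[np.arange(k), probs.argmax(axis=1)] = 1.0   # 1 for the top class, 0 elsewhere
    return int(np.argmax(w @ votes))

# Three hypothetical classifiers, three classes.
p = [[0.90, 0.10, 0.00],
     [0.40, 0.60, 0.00],
     [0.45, 0.55, 0.00]]
print(soft_vote(p), hard_vote(p))  # 0 1
```

In this example the two rules disagree: one very confident classifier dominates the accumulated probabilities under soft voting, while hard voting follows the two-to-one majority.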
Hard voting 36 requires binarizing the probabilities, where the class with the highest probability is set to 1 and the rest are set to 0. This step can be implemented using Eq. (6).
of 0.8 yielded the highest score. The appropriate learning parameter T may vary depending on the dataset. To determine the optimal pool size, K value, and T value for a dataset to which this method is applied, a larger range of experimental research is required. When applied to transfer learning models, the T-Max-Avg method achieves more successful results on the ChestX, ResNet50, and DenseNet121 models than traditional pooling methods. The model is then further improved by means of cross-validation ensemble techniques to achieve higher accuracy and minimize errors.
The T-Max-Avg method is a combination of the Avg-TopK pooling method and the max pooling method. Its design aims to eliminate the drawbacks of traditional pooling methods. In experimental studies, the T-Max-Avg method has been found to be more effective than using the Avg-TopK method alone. It provides a simple, fast, and convenient new pooling method, offering an alternative to the traditional methods usually preferred in deep learning model design.
Our future research will focus on reducing the time complexity of the T-Max-Avg method while maintaining its accuracy. We aim to achieve a win-win effect in terms of both precision and time.

Data availability
All data generated or analysed during this study are included in this published article.

Figure 7. Visualization of T-Max-Avg, Avg-TopK, Max and Average Pooling methods on a sample image.

Figure 9. Success of pooling methods on datasets for pooling size 2.

Figure 10. Success of pooling methods on datasets for pooling size 3.

Figure 11. Success of pooling methods on datasets for pooling size 4.

Figure 13. Success of pooling methods according to datasets.

CIFAR-100 dataset
The CIFAR-100 dataset is a classic dataset widely used for computer vision tasks. It is an extended version of the CIFAR-10 dataset, consisting of 100 different fine-grained categories. Each category has 600 images, resulting in a total of 60,000 images. These categories cover a wide range of objects, animals, and everyday items such as chairs, tables, flowers, insects, and more. Similar to CIFAR-10, each image in CIFAR-100 has a size of 32 × 32 pixels and consists of RGB channels.

Table 14. Highest scoring performances of pooling methods on datasets.